Corpus

Column

Introduction

The corpus for this project consists of the Spotify top 50 playlists of seven countries: Australia, Brazil, Morocco, the Netherlands, the Philippines, the United Kingdom and the United States. For each continent, I selected the country whose top 50 playlist has the most likes on Spotify. The Netherlands was added because it is especially interesting to me as someone who lives there. This selection keeps the number of songs manageable while still covering music from all over the world and representing a large share of the listeners on each continent. One limitation is that the most-liked playlist on a continent suspiciously often belongs to an English-speaking country, which means that English-language music might be over-represented compared with the countries that are not included. It also might mean that the most-liked playlist does not belong to the country with the most Spotify listeners on its continent, but that many people from across the world like that country’s top 50, possibly because English is so widely spoken. This is hard to prove, however, so I make the reasonable assumption that the most-liked playlists come from the countries that listen to Spotify the most. Despite these limitations, this website attempts to find both regional differences and universal trends in popular music by comparing the different top 50 lists. It also highlights some interesting outliers that differ from the vast majority of popular songs in this corpus, and explores what makes these outliers popular despite being different.
By doing all this, I hope to find out which “ingredients” make up a popular song in general, which “ingredients” of popular music are region-dependent, and when these “ingredients” can be ignored or changed.

Outlier analysis

Row

Release date of the songs

Plot A

This plot shows the release dates of all songs in the top 50 of each region. I expected to see mostly very new songs with maybe a few outliers. Surprisingly, however, a lot of songs are from before 2022, and some were even released before 2000! The plot also shows some regional differences: the Philippines and the UK seem to enjoy older songs the most, with multiple songs from between 1998 and 2014 making their top 50s, while the USA seems to enjoy songs from between 2014 and 2022. Morocco, the Netherlands, Brazil and Australia seem to like only newer songs, with a handful of exceptions.

Plot B

This plot shows the relation between energy and valence in the music from all the different top 50s. Energy and valence are correlated, but not as strongly as you might expect. Nearly every popular song has an energy value of 0.4 or higher, and the few songs below that threshold all have a valence of roughly 0.5 or lower. This suggests that while every “emotional” value is represented in the different top 50s, a song needs a minimum amount of energy to become popular. A few regional differences can be observed: Brazil and the UK seem to enjoy more energetic and happier music than average, while the Philippines shows the opposite. The other countries all seem to have a similar distribution of energy and valence.
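The two observations above (a moderate correlation, and low-energy songs having low valence) can be checked numerically. The following is a minimal sketch using only the Python standard library; the track values are made up for illustration and are not taken from the corpus, and the names `tracks`, `pearson` and `ENERGY_FLOOR` are hypothetical.

```python
import math

# Hypothetical (energy, valence) pairs on a 0-1 scale; NOT real corpus data.
tracks = [
    (0.82, 0.75), (0.65, 0.40), (0.48, 0.62), (0.91, 0.55),
    (0.35, 0.30), (0.72, 0.88), (0.55, 0.20), (0.30, 0.45),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

energy = [e for e, _ in tracks]
valence = [v for _, v in tracks]
r = pearson(energy, valence)

# Songs below the energy threshold mentioned in the text (0.4).
ENERGY_FLOOR = 0.4
low_energy = [(e, v) for e, v in tracks if e < ENERGY_FLOOR]
max_valence_low_energy = max(v for _, v in low_energy)

print(f"correlation between energy and valence: {r:.2f}")
print(f"low-energy songs: {len(low_energy)}, highest valence among them: {max_valence_low_energy}")
```

With the real corpus, `tracks` would be replaced by the energy and valence columns of all seven playlists.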

Row

Plot A

Plot B

Regional differences

Regional differences

When looking for regional differences, it is useful to first look at the individual features per country, to find out what the norm is and how each country differs from it.
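One way to make “the norm” concrete is to compare each country’s mean for a feature with the mean over the whole corpus. This is a minimal sketch, not the method used for the plots on this page; the rows and the function name `country_deviation` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (country, acousticness) rows; NOT real corpus data.
rows = [
    ("Morocco", 0.55), ("Morocco", 0.45),
    ("Australia", 0.10), ("Australia", 0.20),
    ("Brazil", 0.30), ("Brazil", 0.40),
]

def country_deviation(rows):
    """Return the overall mean and each country's mean minus that overall mean."""
    by_country = defaultdict(list)
    for country, value in rows:
        by_country[country].append(value)
    overall = sum(value for _, value in rows) / len(rows)
    deviation = {
        country: sum(values) / len(values) - overall
        for country, values in by_country.items()
    }
    return overall, deviation

overall, deviation = country_deviation(rows)
for country, diff in sorted(deviation.items(), key=lambda item: item[1]):
    print(f"{country}: {diff:+.3f} relative to the corpus mean of {overall:.3f}")
```

A positive value means the country sits above the norm for that feature, a negative value below it. The same function can be applied to each of the features discussed on this page.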

Acousticness

Acousticness shows an interesting trend: all the English-speaking countries and the Netherlands seem to prefer less acoustic songs. Morocco likes acoustic music the most by far, and its regional music seems to contain more acoustic elements, which is reflected in that preference.

Danceability

There seem to be no strong regional trends in danceability. Morocco has a slightly above-average preference for danceability and the Philippines a slightly below-average one.

Energy

Brazil, the Netherlands and the UK prefer music with more energy, which might mean that South America and Europe in general like more energetic music. The Philippines is far below average in the energy level of its music, which seems to be a strong regional preference.

Liveness

Live music does not seem to be very popular in any country: on average, very popular music is nearly always highly polished and thus not live. Only Brazil has a significant number of tracks with a higher liveness value, which says something about the type of music preferred there. Brazilian listeners seem to care less about the polish of typical pop music and may appreciate a more “real” live sound.

Loudness

This graph is very similar to the energy graph, with the same countries at the top and at the bottom. One major difference is that Brazil’s average is much higher than that of the second-place country, whereas for energy the gap was smaller. This again seems to be a regional preference.

Speechiness

This graph is very similar to the liveness graph: very popular songs usually contain little to no speech. But again there are exceptions: Morocco and Brazil do seem to have a significant number of tracks with some speechiness, which says something about the regional music they prefer.

Tempo

It’s very interesting to see that there really seems to be an optimal tempo for a song to become popular, regardless of region. Only in Brazil is the average preferred tempo a bit higher, which makes sense considering the Brazilian preference for louder and more energetic music than average.

Valence

The average valence seems to be at or slightly below zero, which surprises me: you would expect popular music to be happier on average, since people generally like music that makes them happy. However, Brazil and Morocco break this trend, showing a clear preference for happier songs.

Column

Acousticness

Danceability

Energy

Liveness

Loudness

Speechiness

Tempo

Valence

Individual songs

Row

Description

ADD LATER

Column

Chroma analysis of an outlier

Row

Structure similarity matrix of an outlier

Description

These are two self-similarity matrices of “we can’t be friends (wait for your love)” by Ariana Grande: one based on chroma and one based on timbre.
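A self-similarity matrix compares every segment of a song with every other segment, so repeated sections show up as blocks or diagonal stripes of low distance. The sketch below illustrates the idea with plain Python. It assumes cosine distance, which may differ from the distance used for the matrices on this page, and the three-bin “chroma” vectors are a made-up toy example rather than features of the actual track.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp at zero to hide tiny negative values caused by rounding.
    return max(0.0, 1.0 - dot / norm)

def self_similarity(frames):
    """Pairwise distance matrix of a sequence of feature vectors with itself."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]

# Hypothetical feature vectors for four segments in an A B A B form.
verse = [1.0, 0.0, 0.5]
chorus = [0.0, 1.0, 0.5]
frames = [verse, chorus, verse, chorus]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

In this toy example, the first and third segments have a distance of zero to each other, which is the kind of repetition a self-similarity matrix makes visible.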

Row

Chordogram of a pop song

This chordogram shows the chords in “End of Beginning” by Djo. The song has six clear sections. This is typical of many popular songs in the corpus: the sections that show the same chords are the chorus, which returns three times. A recurring chorus has historically been a key feature in creating popular and, most importantly, catchy songs.

Chordogram

Row

Tempogram

Not all pop is rhythmically simple

This tempogram shows an outlier from the corpus: “Pink + White” by Frank Ocean. It is not only one of the older songs in the corpus, but also one of the very few songs without a clear, unchanging tempo throughout. Spotify estimates the overall tempo of this track at 160 BPM, and most online sources agree with this assessment. However, the graph also shows activity at several other BPM values. This is really interesting because it suggests that the tempo of the song might not be clear to the listener, which, judging by the other songs in the corpus, is not a recipe for a successful song. Nevertheless, “Pink + White” has clearly defied the odds and has been massively successful, even charting again years after its initial release.
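A tempogram can show activity at more than one BPM value even for a perfectly steady pulse, because a pulse that repeats every beat also repeats every two beats. The sketch below demonstrates this with a simple autocorrelation on a synthetic onset envelope. It is an illustration under stated assumptions, not an analysis of “Pink + White”: the frame rate, the envelope and the helper names are all hypothetical, and the tempogram on this page may have been computed differently.

```python
FRAME_RATE = 80  # hypothetical number of onset-envelope frames per second

def bpm_to_lag(bpm):
    """Number of frames between two beats at the given tempo."""
    return round(60 * FRAME_RATE / bpm)

def autocorrelation(signal, lag):
    """Unnormalised autocorrelation of a signal at one lag."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

# Synthetic ten-second envelope with one onset every 30 frames (160 BPM).
period = bpm_to_lag(160)
envelope = [1.0 if i % period == 0 else 0.0 for i in range(FRAME_RATE * 10)]

# Strength of the pulse at the true tempo, an unrelated tempo, and half tempo.
strength = {bpm: autocorrelation(envelope, bpm_to_lag(bpm)) for bpm in (160, 120, 80)}
for bpm, value in strength.items():
    print(f"{bpm} BPM: {value}")
```

Here the half tempo (80 BPM) scores almost as high as the true tempo, while 120 BPM scores zero. Extra lines at simple multiples or fractions of the main tempo are therefore expected in many tempograms; activity at unrelated BPM values is a stronger hint of a genuinely ambiguous tempo.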

Patterns in the Top 50

Description

Using Gower’s distance and average linkage, this dendrogram and heatmap show the trends in the Australian top 50. The Australian top 50 is shown here because, judging by the regional differences, it seems to be the most “average” top 50 in the corpus. The heatmap and dendrogram do not show many nuanced patterns, but they do show two major categories that every popular song falls into: songs are either relatively energetic and loud, or relatively low in energy and quiet. The majority of the songs fall into the first category, which is what you would expect in popular music.
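For readers unfamiliar with the two techniques named above, here is a minimal sketch in plain Python. It covers only numeric features, for which Gower’s distance is the average of the range-normalised absolute differences, and it merges clusters by average linkage until a chosen number of clusters remains. The five songs, their values and the function names are hypothetical, and the dendrogram on this page was not necessarily produced this way.

```python
from itertools import combinations

def gower_matrix(rows):
    """Gower's distance for numeric features: mean range-normalised absolute difference."""
    n_features = len(rows[0])
    ranges = []
    for k in range(n_features):
        column = [row[k] for row in rows]
        ranges.append((max(column) - min(column)) or 1.0)  # avoid division by zero
    return [
        [
            sum(abs(a[k] - b[k]) / ranges[k] for k in range(n_features)) / n_features
            for b in rows
        ]
        for a in rows
    ]

def average_distance(dist, a, b):
    """Average linkage: mean pairwise distance between the members of two clusters."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def average_linkage(dist, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(dist, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

# Hypothetical (energy, loudness in dB) values for five songs; NOT real corpus data.
songs = [(0.85, -4.0), (0.80, -5.0), (0.78, -4.5), (0.35, -11.0), (0.30, -12.0)]

dist = gower_matrix(songs)
clusters = average_linkage(dist, n_clusters=2)
print(clusters)
```

In this toy data, the three energetic and loud songs end up in one cluster and the two quieter, lower-energy songs in the other, mirroring the two categories described above.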

Dendrogram and heatmap

Conclusion

Column

What does this all mean?